Unit roots

By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie

Notebook released under the Creative Commons Attribution 4.0 License.


A time series is said to have a unit root if it is described by the equation $y_t = b_0 + y_{t-1} + \epsilon_t$, where $\epsilon_t$ is an error term with mean 0; that is, the coefficient on the lagged value $y_{t-1}$ is exactly 1. Such a time series is not covariance stationary, i.e. its variance and autocovariances change through time. For example, a random walk has a unit root, and its variance grows over time.
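To make the growing variance concrete, here is a minimal simulation sketch. It assumes $b_0 = 0$ and standard normal errors, and the number of paths, the number of steps, and the seed are arbitrary illustrative choices rather than anything from the data used in this notebook.

```python
import numpy as np

# Simulate many independent random walks y_t = y_{t-1} + eps_t
# (b_0 = 0 and standard normal errors are illustrative assumptions).
rng = np.random.RandomState(0)
n_paths, n_steps = 5000, 200
eps = rng.normal(loc=0.0, scale=1.0, size=(n_paths, n_steps))
walks = np.cumsum(eps, axis=1)  # column t-1 holds y_t for every path

# Variance across paths at an early and at a late time step
var_early = walks[:, 9].var()    # t = 10
var_late = walks[:, 199].var()   # t = 200

print('variance at t=10: {:.2f}'.format(var_early))
print('variance at t=200: {:.2f}'.format(var_late))
```

With unit-variance errors, the variance of $y_t$ is $t$ in theory, so the second number should be roughly twenty times the first.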

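The Dickey-Fuller test (available in statsmodels as `adfuller`) tests for the presence of a unit root in a series. Its null hypothesis is that the series has a unit root, so a small p-value is evidence against one.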


In [400]:
from statsmodels.tsa.stattools import coint, adfuller
import pandas as pd

fundamentals = init_fundamentals()
data = get_fundamentals(query(fundamentals.income_statement.total_revenue)
                        .filter((fundamentals.company_reference.primary_symbol == 'MCD') |
                                (fundamentals.company_reference.primary_symbol == 'MSFT') |
                                (fundamentals.company_reference.primary_symbol == 'KO')),
                        '2015-01-01', '30q')

# Get time series for each security individually
x0 = data.values[0].T[1]
x1 = data.values[0].T[2]
x2 = data.values[0].T[0]

In [404]:
print 'p-values of Dickey-Fuller statistic on total revenue data:'
print 'MCD:', adfuller(x0)[1]
print 'KO:', adfuller(x1)[1]
print 'MSFT:', adfuller(x2)[1]


p-values of Dickey-Fuller statistic on total revenue data:
MCD: 0.0941016310217
KO: 0.864822275634
MSFT: 0.583291939879

Since $p > 0.05$ for every series, we cannot reject the null hypothesis that the series has a unit root in any of these cases.

Regression when a time series has a unit root

When we use multiple time series in a regression model (whether as dependent or as independent variables), we must test for the presence of unit roots.

If none of the series involved have unit roots, we can proceed as usual, and our regression analysis will be valid.

If some of the series have unit roots and some do not, the error term in the regression will not be covariance stationary, causing a violation of at least one of the following regression assumptions: that the expected value of the error term is 0, that the error term is homoskedastic, or that the error term is not autocorrelated. Then the regression coefficients and standard errors will be inconsistent, and may falsely appear to be significant. We should not use linear regression in this case.

If all series have unit roots, we need to check for cointegration - that is, whether the error term in the regression is stationary. We'll discuss cointegration below, but for now let's go over the implications. If the time series are not cointegrated, we have the same problems as before with nonstationary error terms. If they are cointegrated, however, the regression coefficients and standard errors will be consistent, and we can use ordinary least-squares regression. However, there also exist finer models for cointegrated time series.

Cointegration

Two or more time series are cointegrated if the error term in a regression of one on the others is stationary, even though each series individually has a unit root. Intuitively, this means that they do not diverge arbitrarily. In practice, this often means that there is a relationship between the series that causes them to move in tandem. We can test for cointegration by checking if the error term has a unit root, or just use the test implemented in statsmodels.


In [405]:
# Compute the cointegration test p-value for each pair of series
print 'p-values of cointegration statistic on total revenue data:'
print 'MCD and MSFT:', coint(x0, x1)[1]
print 'MCD and KO:', coint(x0, x2)[1]
print 'MSFT and KO:', coint(x1, x2)[1]


p-values of cointegration statistic on total revenue data:
MCD and MSFT: 0.105977174014
MCD and KO: 0.0151732358446
MSFT and KO: 0.15076649237

Only the MCD and KO pair has a p-value below 0.05, so at the 5% level we reject the null hypothesis of no cointegration for that pair and cannot reject it for the other two. Therefore, of the three pairs, it would be valid to run a linear regression on MCD and KO only. This is consistent with our claim that cointegrated series often have an underlying economic relationship, since many of the same factors affect McDonald's and The Coca-Cola Company revenues, but not Microsoft's.